Abstract - A new family of gallium arsenide circuits for fine-grained bit-systolic
arithmetic arrays is introduced. This scheme combines features of two re- 
cent techniques of dynamic gallium arsenide FET logic and differential dynamic 
single-clock CMOS logic. The resulting circuits are fast and compact, with 
tightly constrained series FET propagation paths, low fanout, no dc power dis- 
sipation, and depletion FET implementation without level shifting diodes. 


1 Introduction 

The advantages of parallel arrays of serial arithmetic processing modules have been recog- 
nized since the 1950s. These arrays were initially seen as a way to speed up signal and 
information processing. With the advances in VLSI technology, it is now possible to easily 
realize these highly modular architectures. 

A number of bit-systolic arithmetic arrays have been developed with the intent of
maximizing the clock rate for a given CMOS process, including a 2-D convolver for image
processing, an integer polynomial solver, and a finite-field polynomial solver. These arrays
were modeled using SPICE in a 2-micron (minimum feature size) CMOS process offered
through MOSIS. Even with this relatively inexpensive, low-performance process and
worst-case models, the cells were modeled as operating reliably at well over 100 MHz [9,13].

A new dynamic gallium arsenide logic circuit family was proposed in 1991 [1]. These
circuits, resembling GaAs DRAM circuits, have several advantages over previously used
dynamic GaAs logic circuits. Prior dynamic GaAs circuits use both depletion and
enhancement mode devices on the same die, and typical process yield is about 50%. A major
advantage of the new family of dynamic circuits used for this project is that only depletion
mode devices are required to realize any function. Using only one type of device should give
significantly higher process yields.

A second advantage is the absence of any DC power supply. This characteristic completely
eliminates the DC power dissipation problems that have severely limited the use of dynamic
GaAs logic circuits. It is not without its downside, however: the clock driver must be
strong enough to rapidly charge the capacitors used in the circuits, and clock skew
becomes critical.

Another characteristic which sets these circuits apart from other dynamic GaAs circuits 
is their regularity and lack of level shifting diodes, a messy requirement of previous dynamic 
GaAs families. These circuits may be very compact and possess the traits desirable in systolic 
architectures. 

The systolic arithmetic cells described earlier were redesigned using this new dynamic
GaAs logic. The new circuits were modeled using P-SPICE and process parameters
characteristic of the Vitesse depletion mode GaAs FET. The models show an order of magnitude
improvement in clock speed when compared to the CMOS cells described by Winters et al.


2 Dynamic GaAs Circuits 

The complexity of GaAs FET VLSI circuits is limited by the maximum power dissipation 
while the uniformity of the device parameters determines the functional yield. For a given 
process yield, the functional yield can vary significantly. The variation is due, in part, to the 
use of different circuit structures that may be operated in either dynamic or static modes. 
Also, the sensitivity of the proper functionality to variations in the process parameters is 
highly dependent on the selection of the circuit structure and the mode of operation. 

Static GaAs circuits are ratioed circuits. This is one of the reasons why their functionality 
and speed of operation are strongly dependent on the variations in the device threshold 
voltage. The fundamental requirement for dynamic circuits is that the ratio of the device 
current in its ON state to that in the OFF state should be sufficiently large (several orders 
of magnitude). This basic requirement for dynamic logic circuits can be met using GaAs 
circuits. 
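
The role of the ON/OFF current ratio can be illustrated with a constant-current
approximation, t = C * dV / I, applied to a single storage node. The sketch below is only
an illustration: the capacitance, voltage swing, and current values are hypothetical and
are not taken from the paper or from any process data. Under this approximation the ratio
of hold time to charge time equals the ON/OFF current ratio, which is why several orders
of magnitude are needed for a node to hold its state across many clock periods.

```python
def charge_and_hold_times(c_farads, delta_v, i_on, i_off):
    """Return (charge_time, hold_time) in seconds for one storage node.

    Constant-current approximation: t = C * dV / I. The same voltage
    change delta_v is used for charging and for leakage droop, which is
    a simplification made only for this illustration.
    """
    charge_time = c_farads * delta_v / i_on
    hold_time = c_farads * delta_v / i_off
    return charge_time, hold_time


# Hypothetical values chosen only for illustration (not from the paper).
C_NODE = 20e-15   # storage capacitance, farads
DELTA_V = 0.5     # voltage change considered, volts
I_ON = 100e-6     # device current in the ON state, amperes
I_OFF = 100e-9    # device current in the OFF state, amperes

charge_time, hold_time = charge_and_hold_times(C_NODE, DELTA_V, I_ON, I_OFF)
on_off_ratio = hold_time / charge_time  # equals I_ON / I_OFF
```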

Dynamic circuits are ratioless, which makes their functionality completely insensitive to 
threshold voltage variations. The dynamic GaAs circuit’s speed is also significantly less
affected by these variations than in static logic designs. 

The family of dynamic GaAs circuits used for the designs in this project does not dissipate
DC power. Dynamic circuits have smaller device counts when compared to static circuits 
having similar functionality. These features are very attractive for the implementation of 
ultra high speed VLSI architectures as demonstrated in this paper. 

The new GaAs dynamic circuits [1] used in the implementation of the systolic arithmetic
arrays evolved as an extension of the ideas employed in the JCMOS DRAM cell
and the related work on BiCMOS dynamic memory and logic structures [10]-[12].

The operation of the new circuits can be easily explained using the diagrams in Figure 
1. An intermediate stage of a dynamic shift register and its idealized functionality are 
illustrated. 




Figure 1: The D-type flip-flop uses depletion mode transistors having a threshold voltage of
-0.7 volts.

During T1, Phi2 = 0 volts. The master section is in the sample phase. The input data is
stored on C1. C1 is charged to 2 volts or discharged to 0 volts for a logical 1 or 0 respectively.
C2 is precharged to approximately 3.5 volts through J2 (Phi1 = -3.5 volts).

The slave section is in the evaluation phase. Transistor J3 is in cutoff and the drain
voltage of J2, Vd2 = 0 volts, thus providing a reference voltage for evaluating the data stored
on C3. If C3 is charged, J4 is turned off and the precharged capacitor C4 retains its voltage
(logic 1). However, if C3 is discharged, J4 is turned on and C4 is discharged to represent a
logic 0. During T2, the roles of the master and slave sections are interchanged.
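
The two-phase behavior described above can be sketched as a purely behavioral model that
ignores voltages and tracks only logic values. It assumes that during T2 the slave samples
the value evaluated by the master, which is one reading of "the roles are interchanged".
The class and function names are hypothetical, and the node names c1, c3, and q only
loosely follow the capacitor labels of Figure 1.

```python
class DynamicFlipFlop:
    """Behavioral model of one master/slave stage (logic values only)."""

    def __init__(self):
        self.c1 = 0  # value sampled by the master section
        self.c3 = 0  # value held by the slave section
        self.q = 0   # evaluated output of the slave section

    def t1(self, data_in):
        """T1: the slave evaluates its stored value, the master samples."""
        self.q = self.c3
        self.c1 = data_in
        return self.q

    def t2(self):
        """T2 (assumed): the master evaluates and the slave samples the result."""
        self.c3 = self.c1


def shift_register(bits, stages):
    """Clock a chain of stages and return the output observed in each T1."""
    chain = [DynamicFlipFlop() for _ in range(stages)]
    outputs = []
    for bit in bits:
        # During T1 each master samples what the previous slave is evaluating.
        inputs = [bit] + [stage.c3 for stage in chain[:-1]]
        for stage, value in zip(chain, inputs):
            stage.t1(value)
        outputs.append(chain[-1].q)
        for stage in chain:
            stage.t2()
    return outputs
```

In this model each stage delays the data by one full clock cycle (one T1 plus one T2), so a
chain of k stages returns the input stream delayed by k cycles.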

Simulations of this circuit, using a device model that accounts for second-order
effects and is accurately calibrated to a 1-micron HFET process, verify the operation of
the D-type flip-flop at 2 GHz [1]. A comparison with DCFL implementations shows that the
dynamic circuit requires 30% less area and dissipates only 10% of the power (dynamic only)
of the DCFL flip-flop.

The basic dynamic circuit can also be used to implement the AND, OR and complex 
logic functions. The operation of the basic circuit is similar to that of the dynamic flip-flop.

Figure 2 summarizes the operation. When the input is logic 0, the capacitor C is discharged
during the sampling phase and J2 will turn on during the evaluation phase. Similarly, if the
input is a logical 1, the capacitor is charged during the sampling phase, causing J2 to turn
off during the evaluation phase.

Figure 2: Basic dynamic logic circuit

Figure 3 shows a complex dynamic logic gate. 
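
One way to reason about such gates is to treat the evaluation FETs as a switch network
that discharges a precharged output node. The sketch below is an assumption-laden
illustration, not the circuit of Figure 3: it assumes each FET conducts when its stored
input is logic 0 (as described for J2 above) and that FETs are combined in series and
parallel branches. The network encoding and function names are hypothetical.

```python
def conducts(network, inputs):
    """Return True if the switch network provides a discharge path.

    network is a nested tuple: ("fet", name), ("series", n1, n2, ...),
    or ("parallel", n1, n2, ...). A FET conducts when its input is 0.
    """
    kind = network[0]
    if kind == "fet":
        return inputs[network[1]] == 0
    if kind == "series":
        return all(conducts(branch, inputs) for branch in network[1:])
    if kind == "parallel":
        return any(conducts(branch, inputs) for branch in network[1:])
    raise ValueError("unknown network element: %r" % (kind,))


def evaluate_gate(network, inputs):
    """Output is 0 if the precharged node is discharged, otherwise 1."""
    return 0 if conducts(network, inputs) else 1


# Under these assumptions, series FETs give OR and parallel FETs give AND.
OR_GATE = ("series", ("fet", "a"), ("fet", "b"))
AND_GATE = ("parallel", ("fet", "a"), ("fet", "b"))
# A complex gate computing (a OR b) AND c in a single stage.
COMPLEX_GATE = ("parallel", ("series", ("fet", "a"), ("fet", "b")), ("fet", "c"))
```

For example, evaluate_gate(COMPLEX_GATE, {"a": 1, "b": 0, "c": 1}) returns 1, because
neither the series branch nor the FET driven by c conducts.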

Bit-systolic systems are defined here as synchronous digital systems whose combinatorial 
timing paths involve the computation of no more than one bit of data. Moreover, these 
systems are constrained to be locally connected, that is, modules at each level of hierarchy 
are connected exclusively to their nearest neighbors in the physical artwork. Therefore, bit-
systolic systems are very fine-grained pipelined architectures with tightly restricted fanout
and interconnect capacitance. Such systems may also be said to be bit-extensible, implying 
that a computational word width may be extended to an arbitrary number of bits without 
affecting the clock speed (excluding clock and control signal loading). 

Figure 3: The dynamic complex logic gate

For example, Figure 4 illustrates a bit-systolic serial-parallel multiply-accumulator
introduced in 1990 as a building block for 2-D image convolver and single-instruction
multiple data path (SIMD) processor arrays [13]. Here, the multiplier, x, is pipelined
through n stages, and the multiplicand, y, is input in parallel form. The maximum fanout in
the multiplier pipeline is 2 gate inputs. Summing is performed in an accumulator pipeline
consisting of full-adder modules and pipeline delays aligning the product-sum to the
multiplier. This systolic multiplier requires 2n clock cycles to multiply two n-bit unsigned
integers. The least significant bit of the product, x0y0, appears at the output n clock
cycles into the multiplication sequence. The module may be easily extended to accommodate
signed inputs. Operation of the bit-systolic multiplier is illustrated in Table 1.

Figure 4: Bit-systolic serial-parallel multiplier


The useful property of this configuration is that it contains n bits of storage for the 
multiplier and 2n bits for the product. An array of these modules would have the proper 
ratio of operand versus product storage. For instance, a product sum could be accumulated 
by a single multiplier whose output is fed back to its addend input in 2n clock cycles per
multiplication. 
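
The 2n-cycle multiplication sequence can be illustrated with a behavioral model of a
textbook carry-save serial-parallel multiplier. This is a sketch under stated assumptions
rather than a cycle-accurate model of Figure 4: it omits the multiplier pipeline and the
addend input, so the product LSB appears on the first cycle instead of n cycles into the
sequence. The function name and register names are hypothetical.

```python
def serial_parallel_multiply(x, y, n):
    """Multiply two n-bit unsigned integers bit-serially in 2n clock cycles.

    x is presented serially, LSB first, followed by n zero bits;
    y is held in parallel form, one bit per bit-cell.
    """
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be n-bit unsigned integers")
    sums = [0] * n     # sum flip-flop in each bit-cell
    carries = [0] * n  # carry flip-flop in each bit-cell
    product = 0
    for cycle in range(2 * n):
        x_bit = (x >> cycle) & 1 if cycle < n else 0
        next_sums = [0] * n
        next_carries = [0] * n
        for i in range(n):
            partial = x_bit & ((y >> i) & 1)            # product cell
            sum_in = sums[i + 1] if i + 1 < n else 0    # neighbor's registered sum
            total = partial + sum_in + carries[i]       # full adder
            next_sums[i] = total & 1
            next_carries[i] = total >> 1
        sums, carries = next_sums, next_carries
        product |= sums[0] << cycle                     # one product bit per cycle
    return product
```

Each bit-cell talks only to its immediate neighbor and to its own carry register, which
matches the nearest-neighbor connectivity and low fanout required of a bit-systolic array.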

Table 1: Bit-systolic multiplier operation

The product pipeline can accumulate external addends with its partial products without
additional adder logic. The external addend is shifted serially into the a input and must
be pre-shifted n bits into the product pipeline before the LSB of the multiplier is entered.
Thus, the lower n bits of the addend occupy the high n bits of the product pipeline at the
beginning of a multiply sequence. Then, n multiplier bits are shifted into the multiplier
pipeline, leaving the LSB of the accumulated product in the LSB of the product pipeline.
During the next n clock cycles, the multiplier is shifted out (replaced by zeroes), the low n
bits of the product are shifted out of the product pipeline, the high n bits of the product
are left in the low half of the product pipeline, and the low n bits of the next addend
are pre-shifted into the high half of the product pipeline. This is illustrated in Table 2
for a four-bit systolic serial-parallel multiplier. Once again, the shaded rows represent
the multiplier pipeline. The unshaded rows represent the accumulator pipeline, where only
the contents of the first flip-flops in each bit-cell are shown.



Table 2: Multiply/Accumulate Operation


Each bit-cell of the multiplier module may be constructed from four basic cell types: 
XOR, Carry, Latch, and Product, which will be described later. In the CMOS version, these 
were implemented in differential dynamic circuits that, like the array architecture, were very 
constrained in propagation path, fanout, and connectivity. Mapping the basic differential 
circuit structure to dynamic GaAs circuits was straightforward.


3 GaAs Systolic Cells

The first step in this project was to design the circuits for each cell required. The CMOS 
circuits were available and their performance characteristics well documented [9]. The CMOS 
design required five cells: a latch, a product cell, a carry cell, and both a P-logic and an
N-logic XOR cell. These cells could then be tiled into arrays with alternating P and N stages.
The GaAs cells are composed of only depletion mode devices, so only four cells were necessary.
Pipelining in the GaAs arrays is accomplished through the two clocks, whereas the CMOS
design relies on the alternating P and N stages with a single clock signal. Figures 5 and 6
show the CMOS circuits and the corresponding GaAs analogs, respectively.
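
The logical roles of the four GaAs cell types can be sketched at the bit level. The
functions below are an assumption about what each cell computes (in particular, that the
carry cell is a three-input majority function and that the latch is a one-cycle delay);
they say nothing about the differential circuit implementation, and all names are
hypothetical.

```python
def product_cell(x_bit, y_bit):
    """One-bit partial product (assumed to be a logical AND)."""
    return x_bit & y_bit


def xor_cell(a, b):
    """Two-input exclusive OR."""
    return a ^ b


def carry_cell(a, b, c):
    """Carry out of a full adder (assumed majority of three inputs)."""
    return (a & b) | (a & c) | (b & c)


class LatchCell:
    """One-cycle pipeline delay."""

    def __init__(self):
        self.state = 0

    def clock(self, data_in):
        data_out = self.state
        self.state = data_in
        return data_out


def full_adder(a, b, carry_in):
    """Full adder composed of two XOR cells and one carry cell."""
    sum_bit = xor_cell(xor_cell(a, b), carry_in)
    return sum_bit, carry_cell(a, b, carry_in)
```

Composed this way, one full adder uses two XOR cells and one carry cell, with latch cells
providing the pipeline delays between stages.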



Figure 5: CMOS logic cells for systolic arrays 

A quick comparison of the CMOS and GaAs cells shows that the device count (including
capacitors) is quite comparable. The CMOS process offered by MOSIS is a 2 µm (minimum
feature size), 2 metal layer process. MOSIS offers a 0.7 µm, 3 metal layer GaAs process
(Vitesse). The smaller feature size, along with an additional metal layer, should equate to
more densely packed circuits and hence reduced parasitic capacitances. Smaller parasitics,
coupled with gallium arsenide’s higher charge mobility, mean that the cells should perform
at a significantly faster clock speed than the CMOS circuits.


4 Simulation

The GaAs circuits were simulated using P-SPICE (professional version). Since a circuit
layout was not implemented, the parasitic capacitances were conservatively estimated. All 
of the GaAs circuits were simulated at 1 and 2 GHz and all of the cells showed reliable 
operation at 2 GHz. 
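
As a rough illustration of what these clock rates imply for the 2n-cycle multiplier
described earlier, the following sketch converts a clock frequency into a cycle time and a
per-multiplication latency. The 8-bit word width used in the example is hypothetical and
is not a figure from the simulations.

```python
def clock_period_ps(clock_hz):
    """Clock period in picoseconds."""
    return 1e12 / clock_hz


def multiply_time_ns(n_bits, clock_hz):
    """Time for one n-bit multiplication at 2n clock cycles, in nanoseconds."""
    return 2 * n_bits / clock_hz * 1e9


# Hypothetical example: an 8-bit multiplier at the two simulated clock rates.
period_at_2ghz = clock_period_ps(2e9)
latency_at_1ghz = multiply_time_ns(8, 1e9)
latency_at_2ghz = multiply_time_ns(8, 2e9)
```

Because the arrays are bit-extensible, widening the word changes the number of cycles per
multiplication but not the clock period itself.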


5 Conclusions 

The speed of gallium arsenide technology implemented in differential dynamic circuits in
bit-systolic arrays would enable very high performance solutions to a broad range of problems.
The basic cell architectures themselves have been tested and their performance verified in
CMOS [9]. The GaAs cells have the same functionality as the CMOS cells; however, since
no layout of the circuits has been done, the performance evaluation is preliminary and
conservative. It is estimated that, given a good layout, these cells would perform at least 2
times faster than the models used for this paper.


Simulation output: GaAs Carry Cell (2 GHz). Date/Time run: 05/20/92 14:25:14. Temperature: 27.0.